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Abstract 

This  paper  describes  the  medical  information 
retrieval  (MIR)  systems  designed  by  the  Uni¬ 
versity  of  Texas  at  Dallas  (UTD)  for  clinical 
decision  support  (CDS)  which  were  submitted 
to  the  TREC  2014.  We  investigated  the  impact 
of  various  knowledge  bases  for  automatic  query 
expansion  in  the  four  officially  submitted  runs. 

Each  of  these  systems  exploits  both  Wikipedia 
and  PubMed  corpus  statistics  in  order  to  au¬ 
tomatically  extract  keywords.  Extracted  key¬ 
words  were  then  expanded  by  relying  on  struc¬ 
tured  medical  knowledge  bases,  such  as  the 
Unified  Medical  Language  System  (UMLS),  the 
Systemized  Nomenclature  of  Medicine  -  Clini¬ 
cal  Terms  (SNOMED-CT),  and  Wikipedia  as 
well  as  unsupervised  distributional  representa¬ 
tions  based  Google’s  Word2Vec  deep  learning 
architecture.  Our  highest  scoring  submission 
achieved  an  inferred  AP  score  of  0.056  and  an 
inferred  NDCG  score  of  0.205. 

1  Introduction 

In  order  to  make  biomedical  information  more  ac¬ 
cessible  to  physicians,  and  to  anticipate  their  med¬ 
ical  information  needs,  the  Text  REtrieval  Confer¬ 
ence  (TREC)  initiated  the  Clinical  Decision  Support 
(CDS)  track  which  aimed  to  simulate  the  require¬ 
ments  of  Medical  Information  Retrieval  (MIR)  sys¬ 
tems  and  to  encourage  the  creation  of  tools  and  re¬ 


sources  necessary  for  their  implementation.  With  this 
goal  in  mind,  the  focus  of  the  2014  track  was  the 
retrieval  of  biomedical  articles  relevant  for  answer¬ 
ing  generic  clinical  questions  about  medical  “topics” . 
The  medical  topics  for  the  TREC-CDS  2014  track 
were  medical  case  narratives  authored  by  experts  at 
the  U.S.  National  Library  of  Medicine  that  served  as 
idealized  representations  of  actual  medical  records. 
These  medical  case  reports  consist  of  both  a  well- 
formed  narrative  summarizing  the  portions  of  a  pa¬ 
tient’s  medical  record  that  are  pertinent  to  the  case, 
as  well  as  one  of  three  possible  expected  answer  types. 
These  answer  types  refer  to  the  need  to  find  out  the 
diagnosis,  the  treatment  or  the  test  best  suited  for 
the  patient  described  in  the  medical  case.  An  exam¬ 
ple  of  a  medical  topic  evaluated  in  TREC-CDS  2014 
is  provided  in  Table  [I] 

2  The  Architecture 

The  MIR-CDS  systems  designed  for  the  TREC-CDS 
track  in  2014  used  the  open  access  subset  of  the 
PubMed  Centra^]  (PMC)  collection  of  scientific  arti¬ 
cles  as  retrieved  on  January  21,  2014.  This  collection 
contained  a  total  of  733,138  articles.  MIR-CDS  sys¬ 
tems  that  participated  in  the  TREC-MDS  2014  track 
were  challenged  with  retrieving  from  this  collection 
the  scientific  articles  which  were  relevant  to  a  given 

1PMC  is  an  online  digital  database  of  freely  available  full- 
text  biomedical  literature. 
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Figure  1:  System  Architecture  for  Medical  Information  Retrieval  (MIR)  Clinical  Decision  Support  (CDS) 


Topic  8  [Diagnosis] 

Description: 

A  62-year-old  man  sees  a  neurologist  for  progressive 
memory  loss  and  jerking  movements  of  the  lower  ex¬ 
tremities.  Neurologic  examination  confirms  severe  cog¬ 
nitive  deficits  and  memory  dysfunction.  An  electroen¬ 
cephalogram  shows  generalized  periodic  sharp  waves. 
Neuroimaging  studies  show  moderately  advanced  cere¬ 
bral  atrophy.  A  cortical  biopsy  shows  diffuse  vacuolar 
changes  of  the  gray  matter  with  reactive  astrocytosis 
but  no  inflammatory  infiltration. 

Summary: 

62-year-old  man  with  progressive  memory  loss  and  in¬ 
voluntary  leg  movements.  Brain  MRI  reveals  cortical 
atrophy,  and  cortical  biopsy  shows  vacuolar  gray  matter 
changes  with  reactive  astrocytosis. 

Table  1:  Example  of  a  medical  topic  used  in  the 
TREC-CDS  2014  dataset 


medical  topic.  Retrieved  articles  were  judged  rele¬ 
vant  if  they  provided  information  of  the  specified  se¬ 
mantic  answer  type  useful  for  the  given  medical  case 
description. 

In  this  paper,  we  describe  the  four  systems  we 
have  designed  for  the  MIR-CDS  evaluation  offered  by 
NIST  in  the  2014  TREC-CDS  track.  The  remainder 
of  this  paper  is  outlined  as  follows.  Section  2  docu¬ 
ments  the  system  architecture  of  the  MIR-CDS  sys¬ 
tems  we  have  developed.  Section  3  explains  how  we 


automatically  extract  keywords  from  a  given  medical 
topic,  Section  4  details  the  query  expansion  methods 
we  have  evaluated,  and  Section  5  illustrates  the  scien¬ 
tific  article  retrieval  models  we  have  used.  In  Section 
6,  we  present  the  experimental  evaluation  which  is 
then  analysed  in  Section  7.  Finally,  Section  8  sum¬ 
marizes  the  conclusions. 

3  System  Architecture 

The  design  of  our  MIR-CDS  system  benefits  from  our 
past  participation  in  the  2011  and  2012  TRECMed 
evaluations,  where  we  designed  a  medical  informa¬ 
tion  retrieval  system  for  patient  cohort  identification 
[5|  E]  •  As  illustrated  in  Figure  [lj  the  MIR-CDS  sys¬ 
tem  is  provided  with  (a)  a  medical  topic,  and  (b) 
a  collection  of  scientific  articles  and  it  is  expected  to 
return  a  ranked  list  of  scientific  articles  which  are  rel¬ 
evant  to  the  topic.  Given  the  medical  topic,  we  have 
designed  a  key-phrase  extraction  module  which  uses 
basic  syntactic  analysis  as  well  as  Wikipedia  infor¬ 
mation  to  discover  multi-word  medical  key-phrases. 
Examples  of  medical  key-phrases  extracted  by  this 
module  are:  “normocytic  anemia”  and  “coronary 
artery  aneurysm” .  We  then  considered  the  expansion 
of  medical  key-phrases  according  to  several  medical 
knowledge  bases  as  well  as  distributional  information. 
This  allowed  us  to  perform  scientific  article  retrieval 
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using  two  methods:  (1)  Lucene-based  Boolean  re¬ 
trieval,  and  (2)  distributional  vector  retrieval. 

4  Keyword  Extraction 

We  represent  a  given  medical  topic  by  transforming 
the  medical  case  description  into  an  unordered  set 
of  medical  key-phrases  (MKPs).  Our  process  for  ex¬ 
tracting  MKPs  consists  of  three  steps: 

1.  Find  in  the  medical  case  description  the  longest 
non-overlapping  sequences  of  words  which 
correspond  to  titles  of  articles  from  Wikipedia, 
denoted  as  MKIV 

2.  Syntactically  process  the  medical  case  descrip¬ 
tion  with  the  shallow  parser  available  from 
OpenNLP  [lj  and  derive  all  the  noun  phrases, 
generating  a  list  of  medical  key-phrases  MKP2 

3.  Combine  the  entries  in  MKPj  and  MKP2  creat¬ 
ing  MKPi+2  and  filter  it  by  removing  any  entry 
which  occurs  either  in  less  than  5  or  more  than 
60%  of  the  scientific  articles  from  the  TREC- 
CDS  dataset. 

This  allows  us  to  remove  rarely  occurring  or  highly 
common  medical  key-phrases  while  retaining  useful 
multiple-word  expressions  such  as  “coronary  artery 
aneurysm.” 


5  Query  Expansion 

In  written  text,  and  especially  in  medical  records, 
the  morphology  of  words  varies  significantly  both  be¬ 
tween  and  within  documents.  In  order  to  account 
for  these  variations  in  text,  we  store  each  extracted 
keyword  in  several  forms:  its  original  surface  form, 
a  WordNet  [3j  lemmatized  form,  an  unabbreviated 
form  (based  on  a  list  of  common  medical  abbrevi¬ 
ations),  a  form  in  which  hyphens  are  padded  with 
spaces,  a  form  with  all  hyphens  replaced  with  spaces, 
and  a  form  with  all  punctuation  removed.  These 


forms  are  used  as  exact  synonyms  for  the  purposes 
of  keyword  expansion  and  retrieval. 

Unfortunately,  these  slight  variations  in  morphol¬ 
ogy  are  not  enough  to  capture  the  diverse  ways  in 
which  keywords  may  be  expressed  in  medical  records. 
Indeed,  the  topics  presented  in  this  task  often  re¬ 
quire  extensive  domain  knowledge  within  the  field  of 
medicine  to  be  properly  understood.  For  example,  a 
keyword  such  as  fatigue  may  be  referred  to  as  weari¬ 
ness,  lack  of  energy,  malaise,  or  feeling  tired.  These 
synonyms  all  denote  hearing  loss,  but  use  alternate 
phrasings.  In  our  approach,  we  consider  all  of  these 
semantically  related  words  as  ‘expansions’,  and  we 
refer  to  the  process  of  generating  them  as  query  ex¬ 
pansion.  The  following  subsections  describe  the  four 
methods  of  query  expansion  that  we  have  considered. 


UMLS  Expansions  of  fatigue ) 

weariness 
lack  of  energy 
fatigue  extreme 
fatigues 
time  tired 
tired  all  the  time 
lacking  in  energy 


UMLS  Expansion 

The  Unified  Medical  Language  System  (UMLS)  is 
a  resource  for  coordinating  health  and  medical  vo¬ 
cabularies  [3].  UMLS  contains  three  major  com¬ 
ponents:  the  “Metathesaurus”  (which  includes  data 
from  SNOMED,  RxNorm,  MeSH,  and  other  collec¬ 
tions),  the  “Semantic  Network”  which  provides  gen¬ 
eral  categories  and  relationships,  and  the  “SPECIAL¬ 
IST  Lexicon  and  Lexical  Tools”EI  In  the  UMLS 
Metathesaurus,  each  unique  medical  concept  is  asso¬ 
ciated  with  a  ’’concept  unique  identifier”  or  CUI  such 
that  all  entries  in  the  Metathesaurus  which  refer  to 
the  same  concept  share  the  same  CUI.  We  exploit  this 
information  by  expanding  each  key-phrase  so  that  it 

2UMLS  is  described  at  http://www.nlm.nih.gov/research/ 
umls  /  quickstart.  html. 
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includes  all  concepts  in  the  MetMetathesaurus  which 
share  the  same  CUI. 


Wikipedia  Expansions  of  fatigue 

fatique 

fatigue  failure 
tatt 

tired  all  the  time  syndrome 
chronic  fatigue 


Wikipedia  Redirect  Expansion 

Wikipedia  is  a  common  resource  for  natural  language 
processing  tasks.  The  English  version  is  comprised 
of  3,731,340  user-generated  articles  covering  almost 
any  notable  topicj^J  In  addition  to  articles,  Wikipedia 
also  contains  pages  called  redirects  which  do  not  con¬ 
tain  content  themselves,  but  rather  redirect  -  or  send 
the  user  to  another  article  (or  section  of  an  arti¬ 
cle).  Redirects  typically  embody  alternate  names, 
spellings,  forms,  closely  related  words,  alternately 
punctuated  or  encoded  forms,  less  specific  forms  in 
which  the  redirected  name  is  the  primary  topic,  or 
more  specific  forms  of  some  other  page.  Our  system 
uses  Wikipedia  redirect^] to  generate  synonyms,  and 
alternate  (or  mis-)  spellings  for  keywords.  We  do  this 
by  expanding  each  keyword  so  that  it  includes  all  ar¬ 
ticle  titles  that  redirect  (send  the  user)  to  the  same 
article  as  the  key-phrase. 

SNOMED  CT  Expansion 

The  National  Library  of  Medicine  provides  a  resource 
called  the  Systematized  Nomenclature  of  Medicine- 
Clinical  Terms  (SNOMED  CT)  0.  SNOMED  CT 
is  a  clinical  terminology  being  maintained  by  the  In¬ 
ternational  Health  Terminology  Standards  Develop¬ 
ment  Organisation  and  is  one  of  several  designated 
standards  used  in  United  States  Federal  Govern¬ 
ment  systems  for  the  electronic  exchange  of  medical 

liThe  number  of  articles  is  based  on  September  6,  2011. 

4Redirect  and  article  data  is  based  on  the  May  26,  2011 
English  Wikipedia  data  dump. 


recor  <M3  SNOMED  CT  catalogs  both  medical  con¬ 
cepts  and  various  relationships  between  them.  We 
leverage  these  relationships  by  expanding  a  medical 
key-phrase  so  as  to  include  all  SNOMED  CT  entries 
which  partake  in  the  child  side  of  an  IS_A  or  PART_OF 
relationship^]  with  the  key-phrase. 


SNOMED  Expansions  of  fatigue 

complaining  of  debility  and  malaise 
frailty 

heavy  feeling 
senile  asthenia 
tired 

sensation  of  heaviness  in  limbs 
psychogenic  fatigue 
feeling  tired 
attacks  of  weakness 
asthenia 


Word2Vec  Expansion 

In  addition  to  medical  ontologies,  we  have  also  con¬ 
sidered  the  role  of  unsupervised  distributional  se¬ 
mantics  for  query  expansion.  We  used  the  publicly 
available  Gensim  library  provided  in  [7]  to  train  a 
Word2Vec  distributional  word  representation  using 
the  TREC-CDS  scientific  article  corpus.  Word2Vec 
is  a  highly  sophisticated  deep-learning  neural  network 
architecture  which  learns  how  to  represent  words  as 
multi-dimensional  vectors.  The  model  operates  with¬ 
out  human  supervision  by  considering  the  textual 
context  surrounding  words.  These  contexts  may  be 
represented  in  a  variety  of  ways,  although  in  this  work 
we  utilize  the  Skip-gram  model  which  is  able  to  cap¬ 
ture  discontinuous  multi-word  sequences.  Word2Vec 
attempts  to  project  words  onto  a  multi-dimension 
vector  space  such  that  the  proximity  between  two 
vectors  indicates  the  semantic  “similarity”  between 
their  associated  words.  We  used  this  property  by 


sMore  information  on  SNOMED  CT  is  available  at 
http://www.nlm.nih.gov/research/umls/Snomed/snomed_main. 

6  We  consider  children  up  to  2  levels  (grandchildren)  from 
the  parent  concept. 
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Word2Vec  Expansions  of  fatigue 

tiredness 

malaise 

sickness 

instability 

neuropathy 


associated  each  key-phrase  with  its  vector  represen¬ 
tation  and  using  the  twenty  most  “similar”  (as  mea¬ 
sured  by  cosine  similarity)  words  in  the  vocabulary 
as  expansions. 

6  Scientific  Article  Retrieval 

Lucene-based  Scientific  Article  Retrieval 

We  constructed  an  index  of  the  scientific  articles  us¬ 
ing  Apache  LucencQ  When  representing  a  medical 
case  topic,  we  represented  each  key-phrase  as  a  dis¬ 
junctive  Boolean  query  containing  all  of  its  possible 
expansions.  This  allowed  us  to  represent  the  entire 
topic  by  combining  each  medical  key-phrase  query 
into  one  large  Boolean  query.  As  such,  we  deter¬ 
mined  the  relevance  of  a  scientific  article  to  a  medi¬ 
cal  topic  by  determining  the  BM25  score  achieved  for 
each  query  using  Apache  Lucene’s  implementation  of 
the  BM25  ranking  function  [5]. 

Distributional  Scientific  Article  Retrieval 

Latent  Dirichlet  Allocation  (LDA)  presents  a  way  to 
efficiently  learn  latent  topic  distributions  of  words  au¬ 
tomatically  from  a  dataset  [2|.  We  trained  an  LDA 
representation  on  the  scientific  article  collection  used 
for  TREC-CDS.  In  Figure[2j  we  illustrate  the  first  fif¬ 
teen  words  and  their  weights  in  some  of  the  inferred 
latent  topics.  We  then  used  our  trained  LDA  model 
to  represent  each  query  and  document  as  their  latent 
topic  distribution  vectors.  This  allowed  us  to  deter¬ 
mine  the  relevance  of  any  document  and  query  pair 

7  Apache  Lucene  is  an  open  source  high-performance 
search  engine  library  written  in  Java.  Apache  Lucene  is  avail¬ 
able  at  http : //lucene . apache . org/core/ 


by  calculating  the  cosine  of  the  angle  between  their 
latent  topic  distribution  vectors. 

7  Performance  Evaluation 

For  our  experiments,  we  use  the  thirty  topics  au¬ 
thored  for  the  TREC-CDS  task  in  2014.  Twenty- 
six  groups  participated  in  the  TREC-CDS  task, 
yielding  a  total  of  71  fully-automated  systems  and 
11  “manual”  submissions  which  incorporated  user- 
intervention.  The  scientific  articles  collected  across 
all  85  submissions  were  pooled  and  manually  judged 
to  determine  their  relevance  the  associated  topic.  Su¬ 
pervised  by  the  Oregon  Health  &  Science  University 
(OHSU),  physicians  judged  the  twenty  top-ranked 
documents  returned  by  each  system  as  well  as  a  20% 
random  sample  of  the  retrieved  documents  between 
ranks  21  and  100,  yielding  a  total  of  37,949  topic- 
article  pairs.  In  our  experiments,  we  evaluate  the 
performance  achieved  on  this  task  according  to  these 
relevance  judgments. 

We  submitted  four  official  ’’runs”  to  the  2014 
TREC-CDS  task.  Our  first  system,  NQE,  captures 
the  performance  achieved  when  no  query  expansion 
is  performed.  In  this  way,  standard  Lucene-based 
BM25  retrieval  is  performed  based  on  the  key-phrases 
extracted.  Our  second  system,  LDA,  captures  the 
performance  when  Latent  Dirichlet  Allocation  (LDA) 
is  used  to  determine  the  relevance  between  a  medical 
topic  and  a  scientific  article.  Our  third  system,  W2V, 
captures  the  performance  when  Word2Vec  (W2V)  a 
state-of-the-art  distributional  semantic  approach  is 
used  to  expand  queries.  Our  fourth  system,  MIR, 
captures  the  performance  when  all  query  expan¬ 
sion  techniques  except  Word2Vec  (UMLS,  SNOMED, 
Wikipedia)  are  used  to  expand  each  medical  key- 
phrase. 

In  our  evaluations,  we  consider  four  different  per¬ 
formance  measures  of  the  quality  of  retrieved  scien¬ 
tific  articles.  The  inferred  Average  Precision  (infAP) 
estimates  the  expected  average  precision  for  a  topic 
when  only  a  subset  of  the  retrieved  articles  have  rel¬ 
evance  judgments  m-  Likewise,  the  inferred  Nor¬ 
malized  Discounted  Cumulative  Gain  (infNDCG)  ex¬ 
tends  the  commonly  used  NDCG  measure  to  account 
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Figure  2:  The  fifteen  most  likely  words  associated  with  selected  latent  topics  produced  by  the  Latent 
Dirichlet  Allocation  (LDA)  on  the  TREC-CDS  corpus  of  scientific  articles.  The  position  of  words  “loss”  and 
“memory”  has  been  highlighted  in  red  and  green,  respectively. 


System 

infAP 

infNDCG 

R-Prec 

P@10 

NQE 

0.056 

0.205 

0.170 

0.307 

LDA 

0.002 

0.028 

0.015 

0.027 

W2V 

0.044 

0.173 

0.137 

0.263 

MIR 

0.056 

0.201 

0.167 

0.327 

Table  2:  Performance  Evaluation  for  official  submis¬ 
sions  to  TREC-CDS  2014 


for  incomplete  relevance  judgments  m-  We  also  list 
the  R-Precision  (R-Prec)  denotes  the  precision  at  the 
R-th  position  in  the  ranking  results  where  R  is  the 
number  of  relevant  documents,  and  the  precision  of 
the  first  10  documents  (P@10).  Table [7] summarizes 
the  scores  provided  by  NIST  for  our  four  submissions. 
Additionally,  due  to  the  complexity  of  each  topic,  we 
present  the  performance  of  our  system  on  a  per-topic 
basis  in  Figure  [3j 


8  Analysis 

As  shown  in  Table  [7]  the  NQE  and  MIR  systems 
achieve  comparable  performance.  This  suggests  that 
incorporating  medical  knowledge  bases  does  not  yield 
a  significant  increase  in  the  quality  of  retrieved  sci¬ 
entific  articles  without  further  processing.  The  poor 
performance  of  the  Word2Vec  module  likewise  sug¬ 
gests  that  more  analysis  is  needed  to  ascertain  how 
many  expansions  should  be  considered,  and  how 
those  expansions  should  be  weighted.  Finally,  the 
LDA  baseline  performs  exceptionally  poorly,  indicat¬ 
ing  that  more  sophisticated  latent  topic  modelling 
approaches  should  be  considered. 

As  shown  in  Figure[3j  the  performance  of  our  three 
Lucene-based  systems  are  vary  similarly  between  top¬ 
ics.  Clearly,  our  best  performance  is  achieved  on 
Topic  4.  Our  weakest  performance  performance  is 
experienced  on  Topics  8  and  12,  which  will  be  anal¬ 
ysed  in  the  following  sub-sections. 
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Figure  3:  Per-topic  performance  comparison  of  all  four  submitted  systems  as  measured  by  the  inferred 
Average  Precision  (infAP). 


Key-phrase  Analysis 

The  key-phrase  detection  algorithm  we  have  de¬ 
scribed  in  Section  2.1  leverages  two  sources  of  infor¬ 
mation  to  determine  likely  key-phrases:  Wikipedia 
and  syntactic  noun  phrases.  This  allows  us  to  antic¬ 
ipate  sparsity  in  Wikipedia  and  mitigate  noise  from 
syntactic  analysis.  Additionally,  we  preempt  further 
noise  by  filtering  according  to  corpus  statistics  to 
remove  very  rare  or  highly  frequent  candidate  key- 
phrases  in  the  scientific  article  collection.  Despite 
these  measures,  errors  in  automatically  determining 
key-phrases  still  arise.  Consider,  for  example,  Topic 
8,  as  given  in  Table  [lj 

Our  key-phrase  extraction  system  extracts  the 
key-phrases  “memory  loss”,  “brain  MRI”,  “atro¬ 
phy”,  “gray  matter”,  “astrocytosis” ,  “neurologist”, 
“jerking”,  “lower  extremities”,  “neurologic  examina¬ 
tion”,  “cognitive  deficits”,  “electroencephalogram”, 
“sharp”,  “wave”,  “neuroimaging”,  “cerebral  atro¬ 
phy”  ,  and  “gray  matter” .  Although  all  these  phrases 
are  important  to  diagnosing  the  patient  described  in 
the  topic,  a  significant  amount  of  semantic  meaning  is 
lost  when  the  key-phrases  are  removed  from  their  con¬ 
texts.  For  example,  the  presence  of  the  term  “neurol¬ 
ogist”  is  unlikely  to  convey  the  same  impact  to  a  doc¬ 
ument’s  relevance  as  the  presence  of  “astrocytosis.” 
Likewise,  there  is  a  subjective  difficulty  in  determin¬ 
ing  the  ideal  boundary  for  each  extracted  keyphrase: 
should  a  system  consider  “astrocytosis”  or  “reactive 
astrocytosis”?  Is  “reactive  astrocytosis”  a  specific 
class  of  astrocytosis,  or  simply  a  description?  In  or¬ 
der  to  answer  these  types  of  questions,  more  medi¬ 


cal  information  must  be  considered  when  determing 
key-phrase  boundaries.  Unfortunately,  due  to  the  ex¬ 
traordinary  variation  possible  in  query  expansion  and 
relevance  model  implementations,  the  ideal  represen¬ 
tation  is  unlikely  to  be  objectively  true  for  all  sys¬ 
tems.  As  such,  it  is  difficult  to  measure  the  cost-vs- 
reward  of  incorporating  more  sophisticated  forms  of 
query  representation.  For  this  reason,  we  focus  pri¬ 
marily  on  the  impact  of  individual  query  expansion 
methods. 


Query  Expansion  Analysis 

Using  the  medical  key-phrase  “fracture”,  from  topic 
12,  it  is  clear  that  UMLS  and  SNOMED  provide  the 
largest  number  of  potential  expansions.  However, 
many  the  expansions  provided  by  UMLS  consist  of 
phrasal  expressions  (e.g.  “tired  all  the  time”)  which 
are  unlikely  to  appear  in  scientific  discourse.  Like¬ 
wise,  the  expansions  from  SNOMED  are  often  highly 
verbose  (e.g.  “complaining  of  debility  and  malaise”) 
and  need  further  language  processing  in  order  to  use 
in  fact,  it  might  be  reasonable  to  consider  these  ex¬ 
pansions  as  new  “queries”  with  their  own  individual 
key-phrases  (“debility”  and  “malaise”)  which  need 
further  expansion.  The  Wikipedia  expansions,  on  the 
other  hand,  are  much  more  concise  and  easier  to  lo¬ 
cate  within  the  scientific  article  collection.  Finally, 
the  Word2Vec  expansions  are  surprisingly  potent  - 
capturing  many  of  the  terms  yielded  by  structured 
knowledge  bases  such  as  “tiredness”  or  “malaise”. 
Unfortunately,  the  expansions  also  quickly  degener- 
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ate  into  co-occurring  words  which  are  not  directly 
related  to  the  medical  key-phrase. 

9  Conclusions 

The  Clinical  Decision  Support  (CDS)  track  in¬ 
troduced  in  the  2014  Text  REtrieval  Conference 
(TREC)  evaluated  the  problem  of  retrieving  scien¬ 
tific  articles  relevant  to  a  medical  case  description. 
This  entailed  a  wide  variety  of  difficulties,  such  as 
working  with  highly  complex  medical  texts  for  which 
each  term  requires  a  high  degree  of  domain  knowl¬ 
edge  to  comprehend,  and  the  lack  of  training  data 
imposed  by  the  fact  that  is  the  first  iteration  of  this 
track.  Given  these  difficulties,  we  considered  four 
approaches  to  this  task:  (1)  a  straightforward  BM25 
approach  using  Wikipedia-inspired  key-phrase  detec¬ 
tion,  (2)  an  extended  approach  incorporating  medi¬ 
cal  knowledge  bases  for  query  expansion,  (3)  a  fur¬ 
ther  extended  approach  which  additionally  incorpo¬ 
rates  unsupervised  corpus  statistics  based  on  distri¬ 
butional  word  embeddings,  and  (4)  a  separate  ap¬ 
proach  which  considers  the  cosine  similarity  between 
latent  topic  model  (LDA)  representations  of  queries 
and  documents.  Our  first  and  second  submissions 
achieved  promising  performance,  while  the  Word2Vec 
and  LDA-based  systems  were  the  weakest.  For  future 
improvement,  we  plan  to  investigate  how  to  properly 
incorporate  medical  knowledge  bases,  as  well  as  la¬ 
tent  knowledge  for  query  expansion. 
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